Automatic segmentation of speakers in broadcast audio material

Authors

  • Hyoung-Gook Kim
  • Thomas Sikora
Abstract

In this paper, dimension-reduced, decorrelated spectral features for general sound recognition are applied to segment conversational speech in both broadcast news audio and panel discussion television programs. Without a priori information about the number of speakers, the audio stream is segmented by a hybrid metric-based and model-based segmentation algorithm. To measure performance, we compare the segmentation results of the hybrid method against those of metric-based segmentation, using both the MPEG-7 standardized features and Mel-scale Frequency Cepstrum Coefficients (MFCC). Results show that the MFCC features yield better performance than the MPEG-7 features. The hybrid approach significantly outperforms direct metric-based segmentation.

Similar resources

Remes Speaker-Based Segmentation and Adaptation in Automatic Speech Recognition

With proper training, automatic speech recognition works quite well when tested in conditions similar to the training conditions, but with a new speaker or a new environment the system performance often degrades. Speaker-based adaptation alters the speech recognition system to better match a specific speaker and thus improves the speech recognition results. In order to use speaker adaptation, t...

Segmentation, Classification and Clustering of an Italian Broadcast News Corpus

This work reports on preliminary activity at ITC-irst on the problem of acoustic segmentation, classification and clustering of an Italian audio broadcast news corpus. The approach is based on the following stages. First, the input data stream is segmented by detecting spectral changes through the Bayesian Information Criterion (BIC). Second, segments are classified in terms of acoustic conditi...

Automatic Segmentation, Classification and Clustering of Broadcast News Audio

Automatic recognition of broadcast feeds from radio and television sources has been gaining importance recently, especially with the success of systems such as the CMU Informedia system [1]. In this work we describe the problems faced in adapting a system built to recognize one utterance at a time to a task that requires recognition of an entire half hour show. We break the problem into three c...

Speaker tracking in a broadcast news corpus

Speaker tracking is the process of following who says something in an audio stream. When the audio stream is a recording of broadcast news, speaker identity can be important metadata for building digital libraries. Moreover, the segmentation and classification of the audio stream in terms of acoustic contents, bandwidth and speaker gender allow filtering out portions of the signal wh...

Robust Unsupervised Speaker Segmentation for Audio Diarization

Audio diarization (Reynolds & Carrasquillo, 2005) is the process of partitioning an input audio stream into homogeneous regions according to their specific audio sources. These sources can include audio type (speech, music, background noise, etc.), speaker identity and channel characteristics. With the continually increasing number of large volumes of spoken documents including broadcasts, voic...

Publication date: 2004